About the Data

The Office of Spill Prevention and Response (OSPR) runs a statewide oil spill tracking information system called the Incident Tracking Database. An “incident”, for purposes of this database, is “a discharge or threatened discharge of petroleum or other deleterious material into the waters of the state.” The data are collected by OSPR Field Response Team members for Marine oil spills and by OSPR Inland Pollution Coordinators and Wardens for Inland incidents.

We’ll use the incidents recorded in 2008 to explore the spatial distribution of spills across the state.

California counties

First, we’ll load the geographic boundaries of the state as a background layer.

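# Attach the packages this code relies on
library(dplyr)    # %>%, select(), rename(), filter(), count()
library(ggplot2)  # ggplot(), geom_sf()
library(sf)       # read_sf(), st_crs(), st_join()
library(here)     # here()
library(tmap)     # tmap_mode(), tm_shape()

# Read in data with read_sf()
ca_counties <- read_sf(here("ca_counties","CA_Counties_TIGER2016.shp"))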

# Create subset with just name and land area - spatial info will *stick*
ca_subset <- ca_counties %>% 
  select(NAME, ALAND) %>% 
  rename(county_name = NAME, land_area = ALAND)

# Check CRS
#ca_subset %>% st_crs()

# Plot using geom_sf()
ggplot(data = ca_subset) +
  geom_sf(aes(fill = land_area), color = "white", size = 0.1, alpha = 0.5) +
  theme_void() +
  scale_fill_gradientn(colors = c("cyan","blue","purple"))

The coordinate reference system (CRS) used for this dataset is “WGS 84 / Pseudo-Mercator” (EPSG:3857).
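One way to confirm a layer's CRS is to query the object returned by st_crs(). The snippet below is a minimal sketch on a toy point layer; `toy_layer` is a hypothetical stand-in for illustration, not the county shapefile itself.

```r
library(sf)

# Hypothetical one-point layer declared in EPSG:3857 (illustration only)
toy_layer <- st_sf(
  id = 1,
  geometry = st_sfc(st_point(c(-13500000, 4500000)), crs = 3857)
)

# st_crs() returns a crs object whose fields can be read with `$`
crs_info <- st_crs(toy_layer)
crs_info$epsg   # numeric EPSG code of the layer
crs_info$Name   # human-readable name of the CRS
```

The same two lookups could be run on ca_subset to check the county layer.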

Oil Spill Incidence

Next, we’ll load the oil spill data from 2008.

Note that this dataset uses a different coordinate reference system (“NAD83 / California Albers”) from the county boundary data, so we need to transform it so that the two layers match (both “EPSG:3857”).
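The code for this step is not shown here, so the following is only a sketch of how such a reprojection might look with st_transform(). The layers `toy_spills` and `toy_counties` are hypothetical stand-ins built by hand, and EPSG:3310 is used as the code for “NAD83 / California Albers”.

```r
library(sf)

# Hypothetical spill points in NAD83 / California Albers (EPSG:3310)
toy_spills <- st_sf(
  id = 1:2,
  geometry = st_sfc(
    st_point(c(0, 0)),
    st_point(c(100000, -200000)),
    crs = 3310
  )
)

# Hypothetical county polygon in WGS 84 / Pseudo-Mercator (EPSG:3857)
toy_counties <- st_sf(
  county_name = "Toy",
  geometry = st_sfc(
    st_polygon(list(rbind(
      c(-13600000, 4500000), c(-13500000, 4500000),
      c(-13500000, 4600000), c(-13600000, 4600000),
      c(-13600000, 4500000)
    ))),
    crs = 3857
  )
)

# Reproject the points into whatever CRS the polygon layer uses
toy_spills_3857 <- st_transform(toy_spills, st_crs(toy_counties))

# Check that the two layers now share a CRS
crs_match <- st_crs(toy_spills_3857) == st_crs(toy_counties)
```

Applied to the real data, the same pattern would produce new_spill_data by transforming the raw spill layer to st_crs(ca_subset). Passing the target layer's CRS, rather than hard-coding 3857, keeps the two layers aligned even if the county data changes.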

Then, we can plot them together! The interactive map below allows you to investigate the location of each incident that took place in 2008:

#ggplot() +
#  geom_sf(data = ca_subset) +
#  geom_sf(data = new_spill_data, size = 1, color = "red")

# Set viewing mode to "interactive":
tmap_mode(mode = "view")

tm_shape(ca_subset) +
  tm_fill("land_area", palette = "BuGn") +
  tm_shape(new_spill_data) +
  tm_dots()

By County

We might be interested in seeing which counties experienced the most incidents that year. We can use the powerful function st_join() to spatially join our datasets so that the spills can be sorted by county. We’ll focus on inland events only, and color-code by incident frequency:

spill_by_county <- ca_subset %>% 
  st_join(new_spill_data) %>% # first, spatial join
  filter(INLANDMARI == "Inland") %>% # then, filter for inland events
  count(county_name)

ggplot(data = spill_by_county) +
  geom_sf(aes(fill = n), color = "white", size = 0.1) +
  scale_fill_gradientn(colors = c("lightgray","orange","red")) +
  theme_minimal() +
  labs(fill = "Number of spills")
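To see what the join, filter, and count pipeline does row by row, here is a small sketch on hand-built data. The `square()` helper and the `toy_counties` and `toy_spills` layers are hypothetical; only the column names county_name and INLANDMARI mirror the real data.

```r
library(sf)
library(dplyr)

# Hypothetical helper: a closed unit square with lower-left corner (x0, y0)
square <- function(x0, y0, side = 1) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + side, y0), c(x0 + side, y0 + side),
    c(x0, y0 + side), c(x0, y0)
  )))
}

# Three toy "counties" side by side
toy_counties <- st_sf(
  county_name = c("A", "B", "C"),
  geometry = st_sfc(square(0, 0), square(1, 0), square(2, 0), crs = 3857)
)

# Two inland spills in A, one marine spill in B, none in C
toy_spills <- st_sf(
  INLANDMARI = c("Inland", "Inland", "Marine"),
  geometry = st_sfc(
    st_point(c(0.5, 0.5)), st_point(c(0.25, 0.75)), st_point(c(1.5, 0.5)),
    crs = 3857
  )
)

toy_counts <- toy_counties %>%
  st_join(toy_spills) %>%             # one row per county-spill match
  filter(INLANDMARI == "Inland") %>%  # drops marine rows and unmatched (NA) rows
  count(county_name)                  # rows per county = inland spill count
```

In this toy case only county A survives, with n equal to 2. Counties B and C disappear because they have no inland rows after the filter, which suggests that counties with no inland incidents would likewise be absent from the map above rather than drawn with a zero count.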


Data Sources:

  1. Oil Spill Incident Tracking [ds394]. California Department of Fish and Game, Office of Spill Prevention and Response. Published July 23, 2009. Available at https://map.dfg.ca.gov/metadata/ds0394.html

  2. CA Geographic Boundaries. California Department of Technology, using the US Census Bureau’s 2016 MAF/TIGER database. Last updated October 23, 2019. Available at https://data.ca.gov/dataset/ca-geographic-boundaries